Dissertation / PhD Thesis / Book PreJuSER-10841

Timestamp Synchronization of Concurrent Events



2010
Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag Jülich
ISBN: 978-3-89336-625-5

Jülich : Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag, Schriften des Forschungszentrums Jülich : IAS Series 4, XVIII, 116 S. = RWTH Aachen, Diss., 2010

Abstract: Supercomputing is a key technological pillar of modern science and engineering, indispensable for solving critical problems of high complexity. However, to effectively utilize the enormously complex large-scale computer systems available today, scientists and engineers need powerful and robust software development tools. One technique widely used by such tools is event tracing, which has a broad spectrum of applications ranging from performance analysis, performance prediction, and modeling to debugging. In particular, event traces are helpful in understanding the performance behavior of parallel programs since they allow the in-depth analysis of communication and synchronization patterns. The accuracy of such analyses depends on the comparability of timestamps taken on different processors and may be adversely affected by non-synchronized clocks leading to inaccurate relative event timings. Such inaccuracies may cause a given interval to appear shorter or longer than it actually was, or introduce violations of the logical event order, which requires a message to be received only after it has been sent. Inconsistent trace data may not only lead to false conclusions, for instance, when the impact of communication patterns is quantified, but may also confuse the user of trace-visualization tools by causing message arrows to point backward in time-line views. Even more strikingly, trace-analysis tools may also cease to work in a satisfactory manner if they rely on the correct order to function properly. Although linear offset interpolation can restore the consistency of the trace data to some degree, time-dependent drifts and other inaccuracies may still disarrange the original sequence of events, as shown in a study conducted as a part of this Ph.D. thesis.
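As an illustration of the two notions used above, the following minimal sketch shows linear offset interpolation between two offset measurements and a check of the logical event order for one message. It is not taken from the thesis: the function names, the two-measurement setup (one offset at program start, one at program end), and the `min_latency` parameter are assumptions made for this example.

```python
def interpolate_offset(t, t_start, off_start, t_end, off_end):
    """Linearly interpolate the clock offset at local time t.

    Assumes two offset measurements relative to a reference clock:
    off_start taken at local time t_start, off_end at local time t_end.
    """
    if t_end == t_start:
        return off_start
    slope = (off_end - off_start) / (t_end - t_start)
    return off_start + slope * (t - t_start)


def to_reference_time(t, t_start, off_start, t_end, off_end):
    """Map a local timestamp onto the reference clock."""
    return t + interpolate_offset(t, t_start, off_start, t_end, off_end)


def violates_event_order(send_time, recv_time, min_latency=0.0):
    """True if the receive appears earlier than send + minimum latency."""
    return recv_time < send_time + min_latency
```

Because the interpolation assumes a constant drift between the two measurements, a time-dependent drift can still leave `violates_event_order` returning true for some messages after the mapping.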
The already familiar controlled logical clock algorithm accounts for such violations in point-to-point communication by shifting message events in time as much as needed while trying to preserve the length of local intervals. This algorithm is, however, not suitable for realistic applications because (i) it ignores collective and shared-memory operations and (ii) as a serial algorithm it offers only limited scalability. This thesis addresses these shortcomings by extending the algorithm to restore event semantics related to collective and shared-memory operations and by parallelizing the extended version to make it suitable for large-scale systems including computational grids. The basic idea behind the semantic extension is to consider collective and shared-memory operations as being composed of multiple point-to-point messages, taking the semantics of the different flavors of these operations into account. In order to accomplish the correction in a scalable way, both distributed memory and parallel processing capabilities are exploited by processing separate local trace files in parallel and replaying the original communication on as many CPUs as were used to execute the target application itself. To employ the replay mechanism in computational grids, this work also defines the necessary infrastructure to accurately measure clock offsets in distributed environments with hierarchical networks. The methodology was evaluated in practice by integrating the extended and parallelized algorithm into the Scalasca trace-analysis framework and applying it to traces of realistic applications taken on single cluster systems and computational grids. The thesis shows that the algorithm eliminates inconsistent timings of concurrent events while only marginally changing the length of intervals between local events, even if wide-area communication is involved. Scalability is demonstrated with up to 4,096 application processes.
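The idea of shifting receive events while keeping local interval lengths can be sketched as follows. This is a deliberately simplified, serial illustration and not the controlled logical clock algorithm of the thesis: it applies a permanent per-process forward shift and omits the amortization the real algorithm uses to limit that shift. The event layout, `correct_timestamps`, `broadcast_as_messages`, and `min_latency` are all hypothetical names chosen for this example.

```python
def correct_timestamps(events, min_latency=0.0):
    """Shift receive events forward so that no message is received before it is sent.

    events: list of dicts with keys 'proc', 'time', 'kind' ('send', 'recv',
    or 'local') and 'msg' (message id or None). Assumption: the list is in a
    causal order, i.e. each send precedes its matching receive and the events
    of one process appear in local order.
    """
    shift = {}      # accumulated forward shift per process
    send_time = {}  # corrected send time per message id
    corrected = []
    for ev in events:
        s = shift.get(ev["proc"], 0.0)
        t = ev["time"] + s  # carry earlier shifts so local intervals keep their length
        if ev["kind"] == "send":
            send_time[ev["msg"]] = t
        elif ev["kind"] == "recv":
            earliest = send_time[ev["msg"]] + min_latency
            if t < earliest:
                shift[ev["proc"]] = s + (earliest - t)
                t = earliest
        corrected.append({**ev, "time": t})
    return corrected


def broadcast_as_messages(root, members, op_id):
    """Model a one-to-all collective as point-to-point messages from the root."""
    return [(root, p, f"{op_id}:{root}->{p}") for p in members if p != root]
```

In this sketch, a receive on process 1 that was recorded before its send on process 0 is moved to the send time plus `min_latency`, and every later event on process 1 moves by the same amount, so the distance between the receive and the following local event is unchanged.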


Note: Record converted from VDB: 12.11.2012
Note: RWTH Aachen, Diss., 2010

Contributing Institute(s):
  1. Jülich Supercomputing Centre (JSC)
Research Program(s):
  1. Scientific Computing (FUEK411)
  2. 411 - Computational Science and Mathematical Methods (POF2-411)

Appears in the scientific report 2010
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Theses > Ph.D. Theses
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2012-11-13, last modified 2021-01-29

